Comments about the article in Nature: ChatGPT: five priorities for research

The following is a discussion of this article in Nature Vol. 614, 9 February 2023, by Eva A. M. van Dis, Johan Bollen et al.
To read the full text, follow this link: https://www.nature.com/articles/d41586-023-00288-7 In the last paragraph I give my own opinion.

Contents

Introduction
1. Hold on to human verification
2. Develop rules for accountability
3. Invest in truly open LLMs
4. Embrace the benefits of AI
5. Widen the debate
Reflection 1
Reflection 2


Introduction

ChatGPT is a large language model (LLM), a machine-learning system that autonomously learns from data and can produce sophisticated and seemingly intelligent writing after training on a massive data set of text.
The question is to what extent any learning is involved: does ChatGPT learn something when you ask it a question, or does it only learn when it is trained on more data?
A different question is to what extent the output is intelligent, which is a function of the data fed into the computer.
Soon this technology will evolve to the point that it can design experiments, write and complete manuscripts, conduct peer review and support editorial decisions to accept or reject manuscripts.
The question is to what extent ChatGPT can actually make (automatic) decisions without any manual interaction. These are the same type of decisions an automatic pilot makes, from take-off to landing.
However, it could also degrade the quality and transparency of research and fundamentally alter our autonomy as human researchers.
To prevent that, the whole system requires thorough testing, based on actual cases. This is very time-consuming, but extremely important. In addition, the reasoning behind the generated output should be fully explained.
It is imperative that the research community engage in a debate about the implications of this potentially disruptive technology.
Indeed, the research community should engage in a debate about all possible implications.

1. Hold on to human verification

But using conversational AI for specialized research is likely to introduce inaccuracies, bias and plagiarism.
In truth, the whole concept behind ChatGPT is based on plagiarism. To solve that, all sources used should be indicated. If content is copied from Wikipedia, this should be mentioned.
If conversational AI were really clever, then all the inaccuracies in the source documents would be detected. If there are inaccuracies, the assumption is that in most cases the source documents contain roughly two opinions: one that is correct and one that is wrong. If 90% of the sources are correct, there is no problem. If the split is fifty-fifty, then there is a problem, because how does the program decide which one is correct? To do that, the program must evaluate in detail what the differences between the two opinions are. That is very difficult. You are lucky if there are documents which discuss both opinions and all reach the same conclusion.
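To illustrate this counting argument, here is a small sketch in Python (my own illustration; the function and the 90% threshold are hypothetical and have nothing to do with how ChatGPT actually works). It decides purely by counting how many source documents support each opinion, which shows why a fifty-fifty split cannot be resolved without evaluating the content of the documents themselves.

# Hypothetical sketch: choose between two conflicting opinions purely by
# counting how many source documents support each one.
# A near fifty-fifty split is reported as undecidable, as argued above.

def decide_between_opinions(support_a: int, support_b: int,
                            threshold: float = 0.9) -> str:
    """Return 'A', 'B' or 'undecided' based on the share of supporting sources."""
    total = support_a + support_b
    if total == 0:
        return "undecided"          # no evidence at all
    share_a = support_a / total
    if share_a >= threshold:
        return "A"                  # e.g. 90% of the sources agree on opinion A
    if share_a <= 1.0 - threshold:
        return "B"
    return "undecided"              # close to fifty-fifty: counting is not enough

print(decide_between_opinions(9, 1))   # prints: A
print(decide_between_opinions(5, 5))   # prints: undecided

Counting alone is clearly not a solution; it only makes explicit where the real difficulty starts.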
What is the definition of bias? If you ask a communist about the advantages of communism, a socialist about socialism, a liberal about liberalism, a republican, a democrat, and so on, most probably all the answers are biased, especially if in each case a different country is used. Based on that information, it is very difficult even for a human to write an objective story.
For example, when we asked 'how many patients with depression experience relapse after treatment?', it generated an overly general text arguing that treatment effects are typically long-lasting. However, numerous high-quality studies show that treatment effects wane and that the risk of relapse ranges from 29% to 51% in the first year after treatment completion(2–4). Repeating the same query generated a more detailed and accurate answer (see Supplementary information, Figs S1 and S2).
The two replies by ChatGPT are in Figures S1 and S2. The technical details are in references 2, 3 and 4.
The question asked, "How many patients with depression experience relapse after treatment?", requires more thought. First of all, you need a good definition of what counts as a depression, what counts as a relapse and what counts as a treatment. Are references 2, 3 and 4 part of the training data, and what should the correct answer be? The authors should at least discuss these issues.
ChatGPT fabricated a convincing response that contained several factual errors, misrepresentations and wrong data (see Supplementary information, Fig. S3). For example, it said the review was based on 46 studies (it was actually based on 69) and, more worryingly, it exaggerated the effectiveness of CBT.
When you read the actual document, reference 5, you can see that the correct (?) answer is 69.
Such errors could be due to (1) an absence of the relevant articles in ChatGPT’s training set, (2) a failure to distil the relevant information or (3) being unable to distinguish between credible and less-credible sources.
Based on my assumptions:
Researchers who use ChatGPT risk being misled by false or biased information, and incorporating it into their thinking and papers.
That problem is not particular to ChatGPT. First of all, what is biased information, and how do we know that something is biased? If the opinion of an article is not the same as my opinion, I could call it biased. What is false information? False information is a simpler issue than biased information, because most often it is less personal and more scientific. In that sense ChatGPT should stick to the facts as much as possible.
And, because this technology typically reproduces text without reliably citing the original sources or authors, researchers using it are at risk of not giving credit to earlier work, unwittingly plagiarizing a multitude of unknown texts and perhaps even giving away their own ideas.
Yes, that problem exists.
To prevent human automation bias — an over-reliance on automated systems — it will become even more crucial to emphasize the importance of accountability.
By whom? If a human writes an article, he or she is accountable for its content. The assumption is that the article states the truth. However, this issue is not so simple. It is possible that I do not supply all the information, in order to lead the reader in a certain direction. This happens more often when people are paid to achieve a certain goal, or for political reasons.
We think that humans should always remain accountable for scientific practice.
Exactly the same should hold for ChatGPT: ChatGPT is responsible and accountable, certainly if you have to pay to use it.

2. Develop rules for accountability

Author-contribution statements and acknowledgements in research papers should state clearly and specifically whether, and to what extent, the authors used AI technologies such as ChatGPT in the preparation of their manuscript and analysis.
The code of conduct should be that all the authors of a research paper are responsible for its content. As a consequence, research papers should not use ChatGPT. If they do, then the questions asked should be mentioned, as this article does.
Research papers, like this article, can use Google. For example, this article has 12 references. All of these could be found by doing a Google search.

They should also indicate which LLMs were used.
Using the above code of conduct, this is not necessary.
For now, LLMs should not be authors of manuscripts because they cannot be held accountable for their work.
An LLM can never be considered the author of a manuscript, because humans are always involved.
But, it might be increasingly difficult for researchers to pinpoint the exact role of LLMs in their studies.
That is not important because the researchers are responsible for their studies.
In some cases, technologies such as ChatGPT might generate significant portions of a manuscript in response to an author’s prompts.
Again, this is not important, because it is the authors who raise the questions in the form of prompts.

3. Invest in truly open LLMs

To counter this opacity, the development and implementation of open-source AI technology should be prioritized.

4. Embrace the benefits of AI

Some argue that because chatbots merely learn statistical associations between words in their training set, rather than understand their meanings, LLMs will only ever be able to recall and synthesize what people have already done and not exhibit human aspects of the scientific process, such as creative and conceptual thought.
That is correct. True understanding, implemented or expressed in a computer program, is extremely difficult.
We argue that this is a premature assumption, and that future AI-tools might be able to master aspects of the scientific process that seem out of reach today.
If that is the case, then you should also explain how you suggest doing that.
To claim that "it might be possible to master aspects of the scientific process that seem out of reach today" is too general.
In a 1991 seminal paper, researchers wrote that “intelligent partnerships” between people and intelligent technology can outperform the intellectual ability of people alone(11).
That is correct. My understanding is that there exists some sort of upward spiral: humans improve the tools they use; the improved tools make more accurate experiments possible; as a result, humans make new tools or improve existing ones, and so on.
The most important aspect of this cycle is human intellect.
See: Partners in Cognition: Extending Human Intelligence with Intelligent Technologies
These intelligent partnerships could exceed human abilities and accelerate innovation to previously unthinkable levels.
No, this will never happen. As said before, the only intelligence is the human intellect.
The question is how far can and should automation go?
That is an important issue. The answer lies completely in the hands of humans. They bear the responsibility.
AI technology might rebalance the academic skill set.
This is only possible if AI technology understands discussions at an academic level.
At the research level, Google is a very important search tool. This requires no understanding on the part of Google.
On the one hand, AI could optimize academic training — for example, by providing feedback to improve student writing and reasoning skills.
The reasoning part is the most difficult part. It is rather easy if the reasoning is already available in the database. In that case, AI training becomes little more than a search, find and copy-paste operation.
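To make this "search, find and copy-paste" point concrete, here is a toy sketch in Python (my own illustration; the database and the feedback texts are invented). The program has no understanding at all; it only returns the stored text whose key words overlap most with the question.

# Hypothetical sketch: 'feedback' produced by search, find and copy-paste.
# The stored texts are invented examples; nothing is understood or reasoned about.

FEEDBACK_DATABASE = {
    "passive voice": "Prefer the active voice; it makes the argument easier to follow.",
    "missing citation": "Every factual claim needs a reference to its source.",
    "long sentences": "Split sentences of more than about 25 words into two.",
}

def lookup_feedback(question: str) -> str:
    """Return the stored feedback whose topic words overlap most with the question."""
    words = set(question.lower().split())
    best_topic = max(FEEDBACK_DATABASE,
                     key=lambda topic: len(words & set(topic.split())))
    return FEEDBACK_DATABASE[best_topic]    # copy-paste of an existing answer

print(lookup_feedback("Why is the passive voice a problem in my text?"))

Whether this counts as "providing feedback to improve student writing and reasoning skills" is exactly the question.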
In the future, AI chatbots might generate hypotheses, develop methodology, create experiments(12), analyse and interpret data and write manuscripts.
That is only possible if chatbots fully understand the written text.
In place of human editors and reviewers, AI chatbots could evaluate and review the articles, too.
That is only possible if chatbots fully understand the written text.

5. Widen the debate

Given the disruptive potential of LLMs, the research community needs to organize an urgent and wide-ranging debate.
The research community should investigate how intelligent AI really is.
First, we recommend that every research group immediately has a meeting to discuss and try ChatGPT for themselves (if they haven’t already).
They will discover that a real discussion, with an exchange of new ideas, questions, suggestions and answers, is impossible.


Reflection 1 - Should the research community use large language models (LLMs)?

The question is to what extent the research community should use software packages.
The answer is rather simple: when you want to use any software package, you need practical experience, or someone within your organization who has it.
This also means that the software package needs a demonstration package and excellent help functionality. Using this demonstration package, you should set up a case you know, so that the result of the demonstration is as expected. If you make small modifications, the output should again be as expected.
For example:
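A minimal sketch in Python of such a demonstration test (the function and the numbers are only illustrative; any case whose outcome you already know will do):

# Hypothetical sketch: test a tool against a case whose outcome is known,
# then against a small modification, and check that both results are as expected.

def percentage_relapse(relapsed: int, total: int) -> float:
    """The 'package' under test: percentage of patients who relapse."""
    return 100.0 * relapsed / total

# Known case: 29 out of 100 patients relapse, so the result must be 29.0.
assert percentage_relapse(29, 100) == 29.0

# Small modification: 51 out of 100, and again the result must be as expected.
assert percentage_relapse(51, 100) == 51.0

print("Demonstration behaves as expected.")

Only when such known cases behave as expected can you start to trust the package with cases whose outcome you do not know.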


Reflection 2


If you want to give a comment, you can use the following form: Comment form


Created: 20 December 2022

Back to my home page Index
Back to Nature comments Nature Index